Following 9/11, anti-Muslim sentiment developed in the United States as the public came to associate Islam with terrorism (Machin, 2021). More recently, COVID-19 has sparked hate speech and violence against the Asian community, particularly against elderly Asians (Cabral, 2021).
To understand hate crimes in the United States, our Capstone Project seeks (1) to determine whether discrete events are associated with increases in hate crimes, and (2) to identify other possible drivers and moderators of hate crimes.
For this study and the models that follow, hate crimes are defined as “criminal offense[s] against a person or property motivated in whole or in part by an offender’s bias against a race, religion, disability, sexual orientation, ethnicity, gender, or gender identity” (FBI, 2019).
Using 9/11 as a case study, hate crimes motivated by racial/ethnic biases at the national level from 1996 to 2006 are modeled graphically to explore a possible causal relationship.
To tease out other drivers and/or moderators, we identified four features reflecting the general environment (following a PEST analysis): police enforcement, political affiliation, unemployment level, and racial diversity.
These features are first analyzed graphically at the state level. We then validate the results at the county level, where these factors are less generalized and thus more meaningful.
For our data science project, we loaded the following packages, following the tidyverse approach.
# Load necessary packages
pacman::p_load(tidyverse, ggplot2, ggrepel, dplyr, lubridate, rvest, glue,
               purrr, ggpubr, stargazer, reshape2, MASS, summarytools, pscl, urbnmapr)
Then, we imported our datasets.
# Hate crimes motivated by racial/ethnic bias
setwd("/Users/alissao/Desktop/Capstone/Data")
FBI <- read_csv("FBI_US_hate_crime_1991_to_2019.csv")
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_character(),
## INCIDENT_ID = col_double(),
## DATA_YEAR = col_double(),
## ADULT_VICTIM_COUNT = col_logical(),
## JUVENILE_VICTIM_COUNT = col_logical(),
## TOTAL_OFFENDER_COUNT = col_double(),
## ADULT_OFFENDER_COUNT = col_logical(),
## JUVENILE_OFFENDER_COUNT = col_logical(),
## OFFENDER_ETHNICITY = col_logical(),
## VICTIM_COUNT = col_double(),
## TOTAL_INDIVIDUAL_VICTIMS = col_double()
## )
## i Use `spec()` for the full column specifications.
# Police enforcement data
police <- read_csv("https://s3-us-gov-west-1.amazonaws.com/cg-d4b776d0-d898-4153-90c8-8336f86bdfec/pe_1960_2019.csv")
## -- Column specification --------------------------------------------------------
## cols(
## state_postal_abbr = col_character(),
## data_year = col_double(),
## officer_count = col_double(),
## officer_rate_per_1000 = col_double(),
## civilian_count = col_double(),
## civilian_rate_per_1000 = col_double(),
## population = col_double()
## )
# Political affiliation
electoral_2001 <- read_html("https://en.wikipedia.org/wiki/2000_United_States_presidential_election#Results_by_state") %>%
html_node("table.wikitable:nth-child(1)") %>%
html_table()
# Unemployment Level
setwd("/Users/alissao/Desktop/Capstone/BLS")
files <- dir(pattern = "*.csv")
BLS_labor <- files %>%
purrr::map(read_csv) %>%
purrr::reduce(rbind) %>%
rename(County_State = `County Name/State Abbreviation`)
## -- Column specification --------------------------------------------------------
## cols(
## `LAUS Code` = col_character(),
## `State FIPS Code` = col_character(),
## `Country FIPS` = col_character(),
## `County Name/State Abbreviation` = col_character(),
## Year = col_double(),
## `Labor Force` = col_number(),
## Employed = col_number(),
## Unemployed = col_number(),
## `Unemployment Rate %` = col_double()
## )
## (The same column specification was printed for each of the remaining BLS CSVs;
## in two of the files `State FIPS Code` and `Country FIPS` parsed as col_double()
## rather than col_character().)
# Racial Diversity
NCES_raw <- read_html("https://nces.ed.gov/pubs2010/2010015/tables/table_1a.asp") %>%
html_node("div.nces:nth-child(8) > div:nth-child(5) > table:nth-child(1) > tbody:nth-child(2)") %>%
html_table()
The hate crime dataset was then joined with the police enforcement, political affiliation, and racial diversity data.
## Hate crimes motivated by racial/ethnic bias at National Level ##
hatecrimes <- FBI %>%
# use %in% rather than ==; == against a vector recycles and silently drops rows
filter(BIAS_DESC %in% c("Anti-Black or African American",
"Anti-Jewish",
"Anti-White",
"Anti-Hispanic or Latino",
"Anti-Asian",
"Anti-Multiple Races, Group",
"Anti-Islamic (Muslim)",
"Anti-American Indian or Alaska Native",
"Anti-Arab"))
hatecrimes_tbl <- hatecrimes %>%
mutate(DATA_MONTH = month(dmy(INCIDENT_DATE))) %>%
group_by(DATA_YEAR,
DATA_MONTH,
BIAS_DESC) %>%
count(.)
## Hate crimes at State Level
AntiMuslim_hate <- FBI %>%
filter(DATA_YEAR == 2001,
BIAS_DESC %in% c("Anti-Islamic (Muslim)",  # %in%, not ==, to match both biases
"Anti-Arab")) %>%
mutate(DATA_MONTH = month(dmy(INCIDENT_DATE))) %>%
group_by(DATA_YEAR, DATA_MONTH, STATE_ABBR) %>%
count(.)
We then added population data using the tidycensus package.
library(tidycensus)
tidycensus::census_api_key(key = "YOUR_CENSUS_API_KEY", install = TRUE, overwrite = TRUE)  # key redacted
## Your original .Renviron will be backed up and stored in your R HOME directory if needed.
## Your API key has been stored in your .Renviron and can be accessed by Sys.getenv("CENSUS_API_KEY").
## To use now, restart R or run `readRenviron("~/.Renviron")`
#For many tidycensus functions, you specify the different surveys in the following way:
#"acs5": 5-year ACS
#"acs1": 1-year ACS
#"sf1": Decennial census
#ID <- tidycensus::load_variables(year = 2001, dataset = "sf1", cache = TRUE)
vars_decennial <- c(totalPop = "P003001",
PopWhite = "P003003",
PopBlacknAfricanAmer = "P003004",
PopAmIndnAlask = "P003005",
PopAsian = "P003006",
PopPacificHawi = "P003007",
PopOthers = "P003008")
popDf <- tidycensus::get_decennial(
geography = "state",
variables = vars_decennial,
year = 2000
)
## Getting data from the 2000 decennial Census
## Using Census Summary File 1
popDF_wider <- popDf %>%
pivot_wider(names_from = variable, values_from = value)
setwd("/Users/alissao/Desktop/Capstone/Data")
StateFIPS <- read_csv("State_abbv.csv")
## -- Column specification --------------------------------------------------------
## cols(
## GEOID = col_character(),
## STATE = col_character(),
## STATE_ABBR = col_character()
## )
popDF_wider_StateFIPS <- inner_join(popDF_wider, StateFIPS, by = c("GEOID" = "GEOID"))[ ,-10]
We then merged the hate crime counts with the population census data.
hatecrime_Pop <- inner_join(AntiMuslim_hate, popDF_wider_StateFIPS, by = c("STATE_ABBR" = "STATE_ABBR"))
hatecrime_Pop <- hatecrime_Pop %>%
mutate(hatecrime_per_1000 = n/totalPop*1000)
AntiMuslim_hate_byPop <- hatecrime_Pop[ ,c(1:3,14)]
Police enforcement data frame:
## Police enforcement data ##
police <- police %>%
rename(DATA_YEAR = "data_year",
STATE_ABBR = "state_postal_abbr")
# Join Tables
joined_police <- inner_join(AntiMuslim_hate_byPop, police, by = c("DATA_YEAR" = "DATA_YEAR", "STATE_ABBR" = "STATE_ABBR"))
# Spilt into quantiles for police per 1000
ApplyQuantile <- function(x) {
cut(x, breaks=c(quantile(x, probs = seq(0, 1, by = 0.25))),
labels=c("0-25","25-50","50-75","75-100"), include.lowest = TRUE)
}
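As a quick sanity check, here is a hypothetical usage of ApplyQuantile() on a toy vector (not project data); each value falls into one of four equal-frequency bins at its quartiles.

```r
# ApplyQuantile() repeated here so the snippet is self-contained
ApplyQuantile <- function(x) {
  cut(x, breaks = c(quantile(x, probs = seq(0, 1, by = 0.25))),
      labels = c("0-25", "25-50", "50-75", "75-100"), include.lowest = TRUE)
}
# Quartiles of 1..8 are 1, 2.75, 4.5, 6.25, 8, so each pair of
# values lands in successive bins.
as.character(ApplyQuantile(c(1, 2, 3, 4, 5, 6, 7, 8)))
# "0-25" "0-25" "25-50" "25-50" "50-75" "50-75" "75-100" "75-100"
```

Note that `cut()` assigns ties and boundary values to the lower interval via `include.lowest = TRUE`, so the minimum is never dropped.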
joined_police$Quantile <- ApplyQuantile(joined_police$officer_rate_per_1000)
Political affiliation data frame:
## Political affiliation ##
US_states <- electoral_2001 %>%
dplyr::select(c(28,30)) %>%
mutate(State = ifelse(Margin > 0, "Republican State", "Democratic State")) %>%
rename(STATE_ABBR = `State Total`)
# Join Tables
joined_politicalaf <- inner_join(AntiMuslim_hate_byPop, US_states, by = c("STATE_ABBR" = "STATE_ABBR"))
Labor data frame:
## Unemployment Level ##
BLS_labor_tidy <- BLS_labor %>%
rename(DATA_YEAR = Year) %>%
mutate(STATE_ABBR = str_split_fixed(BLS_labor$`County_State`, ", ", 2)[,2]) %>%
group_by(DATA_YEAR, STATE_ABBR) %>%
summarise(UnemploymentRate = mean(`Unemployment Rate %`)) %>%
drop_na()
## `summarise()` has grouped output by 'DATA_YEAR'. You can override using the `.groups` argument.
# Join Tables
joined_labor <- inner_join(AntiMuslim_hate_byPop, BLS_labor_tidy, by = c("DATA_YEAR" = "DATA_YEAR", "STATE_ABBR" = "STATE_ABBR"))
joined_labor$Quantile <- ApplyQuantile(joined_labor$UnemploymentRate)
## Diversity Level ##
USCensus <- popDF_wider_StateFIPS %>%
mutate(DATA_YEAR = 2001,
# element-wise addition; sum() would collapse each column to a single grand total
PopNonWhite = PopBlacknAfricanAmer + PopAmIndnAlask + PopAsian + PopPacificHawi + PopOthers,
RatioNonWhite = PopNonWhite/totalPop)
# Join Tables
joined_diversity <- inner_join(AntiMuslim_hate_byPop, USCensus, by = c("DATA_YEAR" = "DATA_YEAR", "STATE_ABBR" = "STATE_ABBR"))
joined_diversity$Quantile <- ApplyQuantile(as.numeric(joined_diversity$RatioNonWhite))
Analysis of the main effect: Using ggplot2, we plotted hate crime incidents before and after 9/11 by bias group and by year. Notably, after the 2001 terror attacks there was a spike in reported hate crimes against Muslims and Arabs that did not appear for the other groups (Fig1). We also checked other years to see whether the September spike is “seasonal” for anti-Muslim/Arab hate crimes; it clearly occurred only in 2001 (Fig2).
# Check other groups if a similar spike happened in Sept 2001
Fig1 <- hatecrimes_tbl %>%
filter(DATA_YEAR == 2001) %>%
ggplot(aes(x = DATA_MONTH, y = n, group = BIAS_DESC)) +
geom_point(aes(color = as.factor(BIAS_DESC)), alpha = 0.5) +
geom_smooth(method = "loess", formula = y ~ x, se = F,
aes(color = as.factor(BIAS_DESC))) +
labs(title = "Hate Crime Incidents in US by Religious Bias in 2001",
subtitle = "Spike on 9/11 2001 against Muslims and Arabs but not in other groups",
caption = "Source: FBI",
x = "Month",
y = "Number of Incidents",
color = "Bias Description") +
scale_x_continuous(breaks = 1:12) +
ggthemes::theme_economist()+
theme(plot.title = element_text(size=12, face="bold"),
plot.subtitle = element_text(size=11, margin = margin(t=3, r=0, b=10, l=0), hjust=0),
legend.position = "right",
legend.title = element_text(size = 10, face="bold"),
legend.text = element_text(size = 8),
axis.text.x = element_text(size=8),
axis.title.x = element_text(size=8, face="bold", margin = margin(t=3, r=0, b=0, l=0)),
axis.text.y = element_text(size=8),
axis.title.y = element_text(size=8, face="bold", margin = margin(t=0, r=5, b=0, l=0)),
plot.caption = element_text(size=6, hjust=1))+
annotate("rect",
ymin = 0,
ymax = 40,
xmin = 8.5,
xmax = 9.5,
fill = "blue",
alpha = 0.15)
Fig1
# Check other years to see whether the September spike is "seasonal" for anti-Muslim and anti-Arab hate crimes
Fig2 <- hatecrimes_tbl %>%
# use vectorized %in%; || would test only the first element
filter(DATA_YEAR %in% 1999:2003, BIAS_DESC %in% c("Anti-Islamic (Muslim)", "Anti-Arab")) %>%
ggplot(aes(x = DATA_MONTH, y = n, group = DATA_YEAR)) +
geom_point(aes(color = as.factor(DATA_YEAR)), alpha = 0.5) +
geom_smooth(method = "loess", formula = y ~ x, se = F,
aes(color = as.factor(DATA_YEAR))) +
labs(title = "Hate Crime Incidents in US against Muslims and Arabs (1999 - 2003)",
subtitle = "Spike on 9/11 2001 but not in other years",
caption = "Source: FBI",
x = "Month",
y = "Number of Incidents",
color = "Year") +
scale_x_continuous(breaks = 1:12) +
ggthemes::theme_economist()+
theme(plot.title = element_text(size=12, face="bold"),
plot.subtitle = element_text(size=11, margin = margin(t=3, r=0, b=10, l=0), hjust=0),
legend.position = "right",
legend.title = element_text(size = 10, face="bold"),
legend.text = element_text(size = 8),
axis.text.x = element_text(size=8),
axis.title.x = element_text(size=8, face="bold", margin = margin(t=3, r=0, b=0, l=0)),
axis.text.y = element_text(size=8),
axis.title.y = element_text(size=8, face="bold", margin = margin(t=0, r=5, b=0, l=0)),
plot.caption = element_text(size=6, hjust=1))+
annotate("rect",
ymin = 0,
ymax = 40,
xmin = 8.5,
xmax = 9.5,
fill = "blue",
alpha = 0.15)
Fig2
We then explored the other features while keeping the unit of analysis at the state level.
mod_police_revised <- joined_police %>%
filter(DATA_YEAR %in% 2001) %>%
ggplot(aes (x = DATA_MONTH, y = hatecrime_per_1000, group = Quantile)) +
geom_point() +
geom_smooth(method = "loess",
se = F,
formula = y ~ x,
aes(color = Quantile)) +
labs(title = "Effects of police enforcement level on \nMuslim/Arab hate crimes during 9-11",
subtitle = "Higher hate crimes reported in States with higher police enforcement",
caption = "Source: FBI",
x = "Month",
y = "Muslim/Arab Hate Crime Incidents per 1000",
color = "Officer Rate Per 1000") +
scale_x_continuous(breaks = 1:12) +
ggthemes::theme_economist() +
theme(plot.title = element_text(size=12, face="bold"),
plot.subtitle = element_text(size=11, margin = margin(t=3, r=0, b=10, l=0), hjust=0),
legend.position = "right",
legend.title = element_text(size = 10, face="bold"),
legend.text = element_text(size = 8),
axis.text.x = element_text(size=8),
axis.title.x = element_text(size=8, face="bold", margin = margin(t=3, r=0, b=0, l=0)),
axis.text.y = element_text(size=8),
axis.title.y = element_text(size=8, face="bold", margin = margin(t=0, r=5, b=0, l=0)),
plot.caption = element_text(size=6, hjust=1))
mod_police_revised
mod_politicalaf_revised <- joined_politicalaf %>%
filter(DATA_YEAR %in% 2001) %>%
ggplot(aes (x = DATA_MONTH, y = hatecrime_per_1000, group = State)) +
geom_point() +
geom_smooth(method = "loess",
se = F,
formula = y ~ x,
aes(color = State)) +
scale_color_manual(values = c("Blue", "Red"))+
labs(title = "Effects of political affiliation on \nMuslim/Arab hate crimes during 9-11",
subtitle = "Higher hate crimes reported in Democrat States",
caption = "Source: FBI",
x = "Month",
y = "Muslim/Arab Hate Crime Incidents per 1000",
color = "Political Affiliation(2001)") +
scale_x_continuous(breaks = 1:12) +
ggthemes::theme_economist() +
theme(plot.title = element_text(size=12, face="bold"),
plot.subtitle = element_text(size=11, margin = margin(t=3, r=0, b=10, l=0), hjust=0),
legend.position = "right",
legend.title = element_text(size = 10, face="bold"),
legend.text = element_text(size = 8),
axis.text.x = element_text(size=8),
axis.title.x = element_text(size=8, face="bold", margin = margin(t=3, r=0, b=0, l=0)),
axis.text.y = element_text(size=8),
axis.title.y = element_text(size=8, face="bold", margin = margin(t=0, r=5, b=0, l=0)),
plot.caption = element_text(size=6, hjust=1))
mod_politicalaf_revised
mod_labor_revised <- joined_labor %>%
filter(DATA_YEAR %in% 2001) %>%
ggplot(aes (x = DATA_MONTH, y = hatecrime_per_1000, group = Quantile)) +
geom_point() +
geom_smooth(method = "loess",
se = F,
formula = y ~ x,
aes(color = Quantile)) +
labs(title = "Effects of unemployment on \nMuslim/Arab hate crimes during 9-11",
subtitle = "States with lower unemployment experienced more hate crimes during 9-11",
caption = "Source: BLS",
x = "Month",
y = "Muslim/Arab Hate Crime Incidents per 1000",
color = "Unemployment Rate") +
scale_x_continuous(breaks = 1:12) +
ggthemes::theme_economist() +
theme(plot.title = element_text(size=12, face="bold"),
plot.subtitle = element_text(size=11, margin = margin(t=3, r=0, b=10, l=0), hjust=0),
legend.position = "right",
legend.title = element_text(size = 10, face="bold"),
legend.text = element_text(size = 8),
axis.text.x = element_text(size=8),
axis.title.x = element_text(size=8, face="bold", margin = margin(t=3, r=0, b=0, l=0)),
axis.text.y = element_text(size=8),
axis.title.y = element_text(size=8, face="bold", margin = margin(t=0, r=5, b=0, l=0)),
plot.caption = element_text(size=6, hjust=1))
mod_labor_revised
mod_diversity_revised <- joined_diversity %>%
filter(DATA_YEAR %in% 2001) %>%
ggplot(aes (x = DATA_MONTH, y = hatecrime_per_1000, group = Quantile)) +
geom_point() +
geom_smooth(method = "loess",
se = F,
formula = y ~ x,
aes(color = Quantile)) +
labs(title = "Effects of Racial Diversity on \nMuslim/Arab hate crimes during 9-11",
subtitle = "States between 25-50th percentile of non-white experienced the highest hate crimes",
caption = "US Census Data",
x = "Month",
y = "Muslim/Arab Hate Crime Incidents per 1000",
color = "Ratio of Non White") +
scale_x_continuous(breaks = 1:12) +
ggthemes::theme_economist() +
theme(plot.title = element_text(size=12, face="bold"),
plot.subtitle = element_text(size=11, margin = margin(t=3, r=0, b=10, l=0), hjust=0),
legend.position = "right",
legend.title = element_text(size = 10, face="bold"),
legend.text = element_text(size = 8),
axis.text.x = element_text(size=8),
axis.title.x = element_text(size=8, face="bold", margin = margin(t=3, r=0, b=0, l=0)),
axis.text.y = element_text(size=8),
axis.title.y = element_text(size=8, face="bold", margin = margin(t=0, r=5, b=0, l=0)),
plot.caption = element_text(size=6, hjust=1))
mod_diversity_revised
Typically, more police officers deter crime, yet the observation above shows otherwise. This paradox has also been examined in past literature scrutinizing the FBI’s data collection methods. The accuracy of bias-crime reporting is at times compromised by an officer’s ability to identify the correct crime motivation (Cronin, McDevitt, Farrell and Nolan, 2007). For this reason, some researchers, as in Scheuerman’s 2019 study of hate crimes against LGBT people, interpret the numbers as reflecting police units’ ability or willingness to report (because they have the resources) rather than as an indicator of effectiveness in crime prevention or police enforcement.
The three other factors similarly suggested moderation effects; we will discuss the possible rationale for each in more detail when we compare our state-level findings with the county-level analyses. For now, our plots suggest that (a) Democratic states tend to report more hate crimes than Republican states, (b) lower unemployment is associated with more hate crimes, and (c) states with a higher non-white ratio tend to have fewer hate crimes.
To further tease out these variables, we shifted our unit of analysis to the county level and used the FBI’s latest available data, hoping to avoid some of the issues raised about county-level crime data.
Maltz and Targonski (2002), analysing FBI data between 1980 and 1992, found that county-level crime data have major gaps and that the imputation schemes for filling them are inadequate and inconsistent. However, at the time of their writing the FBI was already developing methods to improve the old imputation procedures, so we hope the most recent six years of data (2014-2019) are more accurate.
We considered the same features as in our state-level analysis, except for police enforcement, which is not available at the county level.
To understand the factors that specifically affect the reporting of hate crimes, we controlled for a range of variables that may positively affect the occurrence of crime in general (see Shaw and McKay 1942; Felson 1994; Jenness and Grattet 1996; McVeigh, Welch, and Bjarnason 2003; Stotzer 2010a). We included serious crimes per 1,000 residents (to account for possible overlap between hate crime and other forms of violent offending) and population density (population per square mile of the county’s land area).
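A minimal sketch of how these two controls can be computed; the rows and column names (violent_crimes, land_area_sqmi) are illustrative, not the project's actual data.

```r
library(dplyr)

# Hypothetical county rows with made-up figures
counties_toy <- tibble(
  county_fips    = c("01073", "06037"),
  violent_crimes = c(3200, 58000),
  population     = c(650000, 10000000),
  land_area_sqmi = c(1100, 4060)
)

controls <- counties_toy %>%
  mutate(crime_per1k = violent_crimes / population * 1000,  # serious crimes per 1,000 residents
         pop_density = population / land_area_sqmi)         # persons per square mile of land area
```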
We imported our hate crime dataset from www.icpsr.umich.edu, as this version of the FBI hate crime data is mapped to counties. Violent crime data from 2017 to 2019 were also consolidated.
# hate crimes from 2013 to latest
ucr_hate_crimes_2013_2019 <- read_csv("C:\\Users\\alissao\\Desktop\\Capstone\\Data\\ucr_hate_crimes_1991_2019.csv") %>%
filter(year %in% 2013:2019)
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## ori = col_character(),
## ori9 = col_character(),
## hate_crime_incident_present = col_character(),
## state = col_character(),
## state_abb = col_character(),
## incident_date = col_date(format = ""),
## month = col_character(),
## day_of_week = col_character(),
## agency_name = col_character(),
## city_name = col_character(),
## population_group = col_character(),
## country_division = col_character(),
## country_region = col_character(),
## core_city = col_character(),
## covered_by_ori = col_character(),
## judicial_district = col_character(),
## agency_nibrs_flag = col_character(),
## agency_inactive_date = col_logical(),
## date_ori_was_added = col_date(format = ""),
## date_ori_went_nibrs = col_date(format = "")
## # ... with 60 more columns
## )
## i Use `spec()` for the full column specifications.
# hate crimes during TRUMP's term (3 years only due to data availability)
HC_2017_2019_county <- ucr_hate_crimes_2013_2019 %>%
filter(year %in% c(2017:2019) & !is.na(incident_date) & population != 0) %>%
group_by(county_fips = fips_state_county_code) %>%
summarise(incidents = n())
# violent crimes during TRUMP's term (3 years only due to data availability)
VC_2017_county <- read_csv("C:\\Users\\alissao\\Desktop\\Capstone\\Data\\offenses_known_monthly_2017.csv")[ , c(6,14,26,83)] %>%
filter(population != 0) %>%
rename(county_fips = 'fips_state_county_code',
total_crimes = 'actual_index_violent') %>%
mutate(county_fips = sprintf("%05d", county_fips),
crime_per1k = total_crimes / population * 1000) %>%
dplyr::select(county_fips, crime_per1k) %>%
group_by(county_fips) %>%
summarize(crime_per1k = sum(crime_per1k))
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## ori = col_character(),
## ori9 = col_character(),
## agency_name = col_character(),
## state = col_character(),
## state_abb = col_character(),
## month = col_character(),
## date = col_character(),
## last_month_reported = col_character(),
## agency_type = col_character(),
## crosswalk_agency_name = col_character(),
## census_name = col_character(),
## population_group = col_character(),
## country_division = col_character(),
## core_city_indication = col_character(),
## followup_indication = col_character(),
## covered_by_ori = col_character(),
## special_mailing_group = col_character(),
## special_mailing_address = col_character(),
## first_line_of_mailing_address = col_character(),
## second_line_of_mailing_address = col_character()
## # ... with 11 more columns
## )
## i Use `spec()` for the full column specifications.
VC_2018_county <- read_csv("C:\\Users\\alissao\\Desktop\\Capstone\\Data\\offenses_known_monthly_2018.csv")[ , c(6,16,28,98)] %>%
filter(population != 0) %>%
rename(county_fips = 'fips_state_county_code',
total_crimes = 'actual_index_violent') %>%
mutate(county_fips = sprintf("%05d", county_fips),
crime_per1k = total_crimes / population * 1000) %>%
dplyr::select(county_fips, crime_per1k) %>%
group_by(county_fips) %>%
summarize(crime_per1k = sum(crime_per1k))
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## ori = col_character(),
## ori9 = col_character(),
## agency_name = col_character(),
## state = col_character(),
## state_abb = col_character(),
## month = col_character(),
## date = col_character(),
## arson_last_month_reported = col_character(),
## last_month_reported = col_character(),
## agency_type = col_character(),
## crosswalk_agency_name = col_character(),
## census_name = col_character(),
## population_group = col_character(),
## country_division = col_character(),
## juvenile_age = col_logical(),
## core_city_indication = col_character(),
## followup_indication = col_character(),
## covered_by_ori = col_character(),
## special_mailing_group = col_character(),
## special_mailing_address = col_character()
## # ... with 13 more columns
## )
## i Use `spec()` for the full column specifications.
VC_2019_county <- read_csv("C:\\Users\\alissao\\Desktop\\Capstone\\Data\\offenses_known_monthly_2019.csv")[ , c(6,16,28,98)] %>%
filter(population != 0) %>%
rename(county_fips = 'fips_state_county_code',
total_crimes = 'actual_index_violent') %>%
mutate(county_fips = sprintf("%05d", county_fips),
crime_per1k = total_crimes / population * 1000) %>%
dplyr::select(county_fips, crime_per1k) %>%
group_by(county_fips) %>%
summarize(crime_per1k = sum(crime_per1k))
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## ori = col_character(),
## ori9 = col_character(),
## agency_name = col_character(),
## state = col_character(),
## state_abb = col_character(),
## month = col_character(),
## date = col_character(),
## arson_last_month_reported = col_character(),
## last_month_reported = col_character(),
## agency_type = col_character(),
## crosswalk_agency_name = col_character(),
## census_name = col_character(),
## population_group = col_character(),
## country_division = col_character(),
## juvenile_age = col_logical(),
## core_city_indication = col_character(),
## followup_indication = col_character(),
## covered_by_ori = col_character(),
## special_mailing_group = col_character(),
## special_mailing_address = col_character()
## # ... with 13 more columns
## )
## i Use `spec()` for the full column specifications.
VC_2017_2019_county <- rbind(VC_2017_county,
VC_2018_county,
VC_2019_county) %>%
filter(!grepl("NA", county_fips)) %>%   # drop rows whose FIPS code failed to parse
dplyr::select(county_fips, crime_per1k) %>%
group_by(county_fips) %>%
summarize(crime_per1k = sum(crime_per1k))
We tidied the hate crime, county demographics, and election datasets. In the US, each county has a standard code (FIPS), which we used as the key for merging the different datasets into one.
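To illustrate the FIPS merge with toy rows and made-up figures (not project data), numeric state+county codes are zero-padded to the standard 5-digit FIPS string before joining, so that e.g. 1073 matches the Census key "01073":

```r
library(dplyr)

# Zero-pad numeric codes to 5-digit FIPS strings before joining
hate <- tibble(county_fips = sprintf("%05d", c(1073, 6037)),
               incidents   = c(12, 95))
demo <- tibble(county_fips = c("01073", "06037"),
               pop_density = c(340, 2100))
inner_join(hate, demo, by = "county_fips")
```

Without the padding, the keys "1073" and "01073" would not match and the inner join would silently drop those counties.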
# get all counties and their FIPS code from urbnmapr package of Urban Institute
library(urbnmapr)
counties <- urbnmapr::counties # built-in dataset in the urbnmapr package
counties_coord <- counties
FIPS <- counties %>%
  distinct(state_name, county_name, county_fips) %>%
  mutate(county_state = paste0(county_name, ", ", state_name)) %>%
  dplyr::select(county_fips, county_state)
# read county demographics datasets
foreign_born <- read_csv("C:\\Users\\alissao\\Desktop\\Capstone\\Data\\ACSDT5Y2019.B05002_data_with_overlays_2021-06-15T052041.csv")
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_character()
## )
## i Use `spec()` for the full column specifications.
foreign_born <- foreign_born[-c(1), c(2, 3, 27)] %>%
filter(!str_detect(NAME, paste("Puerto Rico"))) %>%
mutate(perc_foreign_born = as.numeric(B05002_013E) / as.numeric(B05002_001E)) %>%
rename(county_state = 'NAME',
population = 'B05002_001E') %>%
dplyr::select(county_state, population, perc_foreign_born)
unemployed <- read_csv("C:\\Users\\alissao\\Desktop\\Capstone\\Data\\ACSDT5Y2019.B12006_data_with_overlays_2021-06-15T014309.csv")
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_character()
## )
## i Use `spec()` for the full column specifications.
unemployed <- unemployed[-c(1), c(2, 3, 9, 13, 19, 23, 31, 35, 41, 45, 53, 57, 63, 67, 75, 79, 85, 89, 97, 101, 107, 111)] %>%
filter(!str_detect(NAME, paste("Puerto Rico"))) %>%
mutate_at(c(2:12), as.numeric) %>%
mutate(perc_unemployed = (as.numeric(B12006_006E) + as.numeric(B12006_011E) + as.numeric(B12006_017E) + as.numeric(B12006_022E) + as.numeric(B12006_028E) + as.numeric(B12006_033E) + as.numeric(B12006_039E) + as.numeric(B12006_050E) + as.numeric(B12006_055E)) /
(as.numeric(B12006_004E) + as.numeric(B12006_009E) + as.numeric(B12006_015E) + as.numeric(B12006_020E) + as.numeric(B12006_026E) + as.numeric(B12006_031E) + as.numeric(B12006_037E) + as.numeric(B12006_042E) + as.numeric(B12006_048E) + as.numeric(B12006_053E))) %>%
rename(county_state = 'NAME') %>%
dplyr::select(county_state, perc_unemployed)
poverty <- read_csv("C:\\Users\\alissao\\Desktop\\Capstone\\Data\\ACSDT5Y2019.C17002_data_with_overlays_2021-06-15T051215.csv")
## -- Column specification --------------------------------------------------------
## cols(
## GEO_ID = col_character(),
## NAME = col_character(),
## C17002_001E = col_character(),
## C17002_001M = col_character(),
## C17002_002E = col_character(),
## C17002_002M = col_character(),
## C17002_003E = col_character(),
## C17002_003M = col_character(),
## C17002_004E = col_character(),
## C17002_004M = col_character(),
## C17002_005E = col_character(),
## C17002_005M = col_character(),
## C17002_006E = col_character(),
## C17002_006M = col_character(),
## C17002_007E = col_character(),
## C17002_007M = col_character(),
## C17002_008E = col_character(),
## C17002_008M = col_character()
## )
poverty <- poverty[-c(1), c(2, 5, 7)] %>%
filter(!str_detect(NAME, paste("Puerto Rico"))) %>%
mutate_at(c(2:3), as.numeric) %>%
mutate(below_poverty = (C17002_002E + C17002_003E)) %>%
rename(county_state = 'NAME') %>%
dplyr::select(county_state, below_poverty)
nonwhite <- read_csv("C:\\Users\\alissao\\Desktop\\Capstone\\Data\\ACSDT5Y2019.B03002_data_with_overlays_2021-06-15T051604.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_character()
## )
## i Use `spec()` for the full column specifications.
nonwhite <- nonwhite[-c(1), c(2, 3, 7)] %>%
filter(!str_detect(NAME, paste("Puerto Rico"))) %>%
mutate_at(c(2:3), as.numeric) %>%
mutate(perc_nonwhite = 1 - (B03002_003E / B03002_001E)) %>%
rename(county_state = 'NAME') %>%
dplyr::select(county_state, perc_nonwhite)
population_density <- read_csv("C:\\Users\\alissao\\Desktop\\Capstone\\Data\\Average_Household_Size_and_Population_Density_-_County.csv")[ , c(3, 16)] %>%
rename(county_fips = 'GEOID',
         pop_density = 'B01001_calc_PopDensity')
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## COUNTYNS = col_character(),
## GEOID = col_character(),
## NAME = col_character(),
## State = col_character(),
## created_user = col_character(),
## created_date = col_character(),
## last_edited_user = col_character(),
## last_edited_date = col_character()
## )
## i Use `spec()` for the full column specifications.
# Merge all and join with FIPS
US_demographics <- foreign_born %>%
inner_join(unemployed, by = "county_state") %>%
inner_join(poverty, by = "county_state") %>%
inner_join(nonwhite, by = "county_state") %>%
mutate_at(c(2:5), as.numeric) %>%
mutate(perc_poverty = below_poverty / population) %>%
dplyr::select(county_state, population, perc_foreign_born, perc_poverty, perc_nonwhite, perc_unemployed)
US_demographics <- FIPS %>% left_join(US_demographics, by = "county_state")
# merge US_demographics and county_hate_crimes_2017_2019
US_demographics_HC <- US_demographics %>%
left_join(HC_2017_2019_county, by = "county_fips") %>%
left_join(population_density, by = "county_fips") %>%
left_join(VC_2017_2019_county, by = "county_fips")
#fill NA with zeroes (i.e. no hate crimes recorded)
US_demographics_HC[is.na(US_demographics_HC)] = 0
# load election data from MIT (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ)
election_county <- read_csv("C:\\Users\\alissao\\Desktop\\Capstone\\Data\\countypres_2000-2016.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## year = col_double(),
## state = col_character(),
## state_po = col_character(),
## county = col_character(),
## FIPS = col_double(),
## office = col_character(),
## candidate = col_character(),
## party = col_character(),
## candidatevotes = col_double(),
## totalvotes = col_double(),
## version = col_double()
## )
election_county_2016 <- election_county %>%
filter(year == 2016) %>%
group_by(FIPS) %>%
mutate(percentvotes = round(candidatevotes / totalvotes, 2),
FIPS = sprintf("%05d", FIPS)) %>% #pad FIPS with leading zeroes for merge
dplyr::select(FIPS, party, percentvotes) %>%
replace_na(list(party = "other")) %>%
rename(county_fips = 'FIPS') %>%
drop_na() %>%
pivot_wider(names_from = party, values_from = percentvotes)
# convert list columns into numeric using a function; otherwise it leads to a "(list) object cannot be coerced" error
election_county_2016 <- mutate_all(election_county_2016, function(x) as.numeric(as.character(x))) %>%
mutate(margin_repub = republican - democrat - other) %>%
  drop_na()
## `mutate_all()` ignored the following grouping variables:
## Column `county_fips`
## Use `mutate_at(df, vars(-group_cols()), myoperation)` to silence the message.
election_county_2016 <- election_county_2016[-c(2:4)]
#merge with hate crime data
US_demographics_HC <- US_demographics_HC %>% inner_join(election_county_2016, by = "county_fips") %>%
  filter(population > 0)
To prepare for modeling, we checked for multicollinearity by running a correlation matrix to see how our variables of interest (within the model) are related.
corr_matrix <- US_demographics_HC[, -c(1:2)] %>%
as.matrix(.) %>%
Hmisc::rcorr(.) %>%
broom::tidy(.) %>%
filter(p.value < 0.05 & estimate > 0.40) %>%
  print(n = nrow(.))
## # A tibble: 9 x 5
## column1 column2 estimate n p.value
## <chr> <chr> <dbl> <int> <dbl>
## 1 perc_foreign_born population 0.480 3112 0
## 2 perc_nonwhite perc_foreign_born 0.529 3112 0
## 3 perc_nonwhite perc_poverty 0.423 3112 0
## 4 perc_unemployed perc_poverty 0.649 3112 0
## 5 perc_unemployed perc_nonwhite 0.424 3112 0
## 6 incidents population 0.663 3112 0
## 7 pop_density incidents 0.475 3112 0
## 8 crime_per1k population 0.732 3112 0
## 9 crime_per1k incidents 0.522 3112 0
The highest correlation is 0.732, between crime rate (crime_per1k) and population. Since no pair exceeds 0.80, none of the variables needs to be eliminated.
Next, we evaluated the distribution of our data. Since hate crimes are count data, a Poisson distribution is likely. Analyzing count data using ordinary least squares regression may produce improbable predicted values as a result of violated regression assumptions; count data are optimally analyzed using Poisson-based regression techniques such as Poisson or negative binomial regression (Huang, 2020).
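To illustrate the point, here is a minimal sketch (with made-up toy counts, not our hate crime data) showing that OLS can produce an impossible negative predicted count, while a Poisson fit with a log link cannot:

```r
# Toy count data: mostly zeros with a few large counts (illustrative only)
y <- c(0, 0, 0, 0, 0, 0, 1, 2, 8, 20)
x <- 1:10

ols  <- lm(y ~ x)                    # ordinary least squares
pois <- glm(y ~ x, family = poisson) # Poisson regression with log link

min(predict(ols))   # OLS predicts a negative count at small x
min(fitted(pois))   # Poisson fitted values are always positive
```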
We explored various models to evaluate how they fit our data.
First, we looked at frequency distribution:
# Frequency distribution (top 10 rows only)
freq(US_demographics_HC$incidents, rows = 1:10)
## Frequencies
## US_demographics_HC$incidents
## Type: Integer
##
## Freq % Valid % Valid Cum. % Total % Total Cum.
## ------------- ------ --------- -------------- --------- --------------
## 0 1777 57.10 57.10 57.10 57.10
## 1 376 12.08 69.18 12.08 69.18
## 2 191 6.14 75.32 6.14 75.32
## 3 130 4.18 79.50 4.18 79.50
## 4 92 2.96 82.46 2.96 82.46
## 5 72 2.31 84.77 2.31 84.77
## 6 46 1.48 86.25 1.48 86.25
## 7 41 1.32 87.56 1.32 87.56
## 8 34 1.09 88.66 1.09 88.66
## 9 38 1.22 89.88 1.22 89.88
## (Other) 315 10.12 100.00 10.12 100.00
## <NA> 0 0.00 100.00
## Total 3112 100.00 100.00 100.00 100.00
From the results, 57.1% of the counties have zero hate crimes. We compared this with the zero-count probability predicted by a Poisson distribution with the observed mean:
mean(US_demographics_HC$incidents)^0 * exp(-1* mean(US_demographics_HC$incidents)) / factorial(0)
## [1] 0.001424429
The predicted probability is only 0.14%, far below the observed 57.1%, a strong hint of excess zeros. The frequency distribution looks like this:
fr <- table(US_demographics_HC$incidents) %>% data.frame
names(fr) <- c('incidents', 'freq')
fr$incidents <- as.numeric(as.character(fr$incidents)) #convert factor to numeric
ggplot(fr, aes(x = incidents, y = freq)) +
geom_col(color = "blue", fill = "blue") +
theme_bw() +
  lims(y = c(0, 400)) +
  labs(x = "Number of Hate Crimes", y = "Frequency") +
  theme(axis.line = element_line(color = "black"),
        panel.border = element_blank())
We ran OLS, Poisson, negative binomial and zero-inflated negative binomial regressions to compare the model-data fit graphically, and validated the comparison using their respective Akaike Information Criterion (AIC) values:
linear <- glm(incidents ~ perc_unemployed + perc_poverty + perc_nonwhite + perc_foreign_born + margin_repub + pop_density + crime_per1k, data = US_demographics_HC) #linear, OLS
pois <- glm(incidents ~ perc_unemployed + perc_poverty + perc_nonwhite + perc_foreign_born + margin_repub + pop_density + crime_per1k, data = US_demographics_HC, family = "poisson"(link = "log")) #Poisson
negb <- glm.nb(incidents ~ perc_unemployed + perc_poverty + perc_nonwhite + perc_foreign_born + margin_repub + pop_density + crime_per1k, data = US_demographics_HC) #negative binomial
zinb <- zeroinfl(incidents ~ perc_unemployed + perc_poverty + perc_nonwhite + perc_foreign_born + margin_repub + pop_density + crime_per1k, data = US_demographics_HC, dist = "negbin") #zero inflated nb
library(jtools)
options(scipen = 999)
summary(linear)
##
## Call:
## glm(formula = incidents ~ perc_unemployed + perc_poverty + perc_nonwhite +
## perc_foreign_born + margin_repub + pop_density + crime_per1k,
## data = US_demographics_HC)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -372.76 -4.09 0.86 4.03 974.87
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.5245880 1.9204160 -0.794 0.427324
## perc_unemployed -11.0098915 30.0849519 -0.366 0.714420
## perc_poverty -4.8297115 13.3350356 -0.362 0.717241
## perc_nonwhite -14.8866502 4.4155498 -3.371 0.000757 ***
## perc_foreign_born 57.5583394 13.3703613 4.305 0.0000172 ***
## margin_repub -4.2224186 2.1961476 -1.923 0.054616 .
## pop_density 0.0249992 0.0008583 29.125 < 0.0000000000000002 ***
## crime_per1k 0.2886209 0.0083278 34.658 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 959.0911)
##
## Null deviance: 5585631 on 3111 degrees of freedom
## Residual deviance: 2977019 on 3104 degrees of freedom
## AIC: 30208
##
## Number of Fisher Scoring iterations: 2
summary(pois)
##
## Call:
## glm(formula = incidents ~ perc_unemployed + perc_poverty + perc_nonwhite +
## perc_foreign_born + margin_repub + pop_density + crime_per1k,
## family = poisson(link = "log"), data = US_demographics_HC)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -25.139 -2.229 -1.527 -0.381 57.323
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.262542565 0.022560654 100.29 <0.0000000000000002 ***
## perc_unemployed 10.221228333 0.498763864 20.49 <0.0000000000000002 ***
## perc_poverty -7.677760464 0.203204011 -37.78 <0.0000000000000002 ***
## perc_nonwhite -1.137290654 0.062194578 -18.29 <0.0000000000000002 ***
## perc_foreign_born 5.497596786 0.109845438 50.05 <0.0000000000000002 ***
## margin_repub -2.748569488 0.027091836 -101.45 <0.0000000000000002 ***
## pop_density 0.000070303 0.000001493 47.10 <0.0000000000000002 ***
## crime_per1k 0.001005360 0.000010902 92.22 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 101399 on 3111 degrees of freedom
## Residual deviance: 42951 on 3104 degrees of freedom
## AIC: 47404
##
## Number of Fisher Scoring iterations: 6
summary(negb)
##
## Call:
## glm.nb(formula = incidents ~ perc_unemployed + perc_poverty +
## perc_nonwhite + perc_foreign_born + margin_repub + pop_density +
## crime_per1k, data = US_demographics_HC, init.theta = 0.2984154439,
## link = log)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.8503 -0.9869 -0.8185 0.0062 4.9005
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.04491635 0.12440290 8.399 < 0.0000000000000002 ***
## perc_unemployed 6.27711477 2.03253871 3.088 0.00201 **
## perc_poverty -4.18569249 0.90436287 -4.628 0.0000036862101713 ***
## perc_nonwhite -2.26182541 0.29246647 -7.734 0.0000000000000105 ***
## perc_foreign_born 10.00352423 0.84159652 11.886 < 0.0000000000000002 ***
## margin_repub -2.02486871 0.14180386 -14.279 < 0.0000000000000002 ***
## pop_density 0.00042985 0.00005099 8.429 < 0.0000000000000002 ***
## crime_per1k 0.01504465 0.00049672 30.288 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.2984) family taken to be 1)
##
## Null deviance: 4701.6 on 3111 degrees of freedom
## Residual deviance: 2590.6 on 3104 degrees of freedom
## AIC: 11259
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.2984
## Std. Err.: 0.0116
## Warning while fitting theta: alternation limit reached
##
## 2 x log-likelihood: -11240.9240
summary(zinb)
##
## Call:
## zeroinfl(formula = incidents ~ perc_unemployed + perc_poverty + perc_nonwhite +
## perc_foreign_born + margin_repub + pop_density + crime_per1k, data = US_demographics_HC,
## dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.70497 -0.46911 -0.31340 -0.05928 20.81180
##
## Count model coefficients (negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.55530463 0.16111239 9.654 < 0.0000000000000002 ***
## perc_unemployed 3.56705734 3.03430507 1.176 0.239764
## perc_poverty -3.40159241 1.15155942 -2.954 0.003138 **
## perc_nonwhite -1.25794941 0.32499085 -3.871 0.000109 ***
## perc_foreign_born 8.33625976 1.17491674 7.095 0.00000000000129 ***
## margin_repub -1.82684096 0.16118352 -11.334 < 0.0000000000000002 ***
## pop_density 0.00019519 0.00001027 19.009 < 0.0000000000000002 ***
## crime_per1k 0.00820774 0.00098554 8.328 < 0.0000000000000002 ***
## Log(theta) -0.69919178 0.05175602 -13.509 < 0.0000000000000002 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.17206 0.31389 3.734 0.000188 ***
## perc_unemployed -0.15545 4.57090 -0.034 0.972870
## perc_poverty 0.44636 2.13780 0.209 0.834610
## perc_nonwhite 2.36855 0.65437 3.620 0.000295 ***
## perc_foreign_born -6.34886 1.88387 -3.370 0.000751 ***
## margin_repub 0.19345 0.33184 0.583 0.559918
## pop_density -0.07687 0.01188 -6.470 0.0000000000978 ***
## crime_per1k -0.04263 0.00870 -4.900 0.0000009602534 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta = 0.497
## Number of iterations in BFGS optimization: 59
## Log-likelihood: -5430 on 17 Df
#select the model based on AIC values
tmp <- data.frame(OLS = AIC(linear), Poisson = AIC(pois),
Negative.Binomial = AIC(negb), Zero.Inflated.Binomial = AIC(zinb))
tmp
## OLS Poisson Negative.Binomial Zero.Inflated.Binomial
## 1 30208.41 47403.73 11258.92 10894.75
Based on the AIC results above, the zero-inflated negative binomial model has the best (i.e. lowest) AIC, with the negative binomial a close second.
Using the predprob function in the pscl package, we plotted the predicted probabilities, this time excluding OLS as it is obviously not a good fit.
po.pois <- predprob(pois) %>% colMeans
po.negb <- predprob(negb) %>% colMeans
po.zinb <- predprob(zinb) %>% colMeans
df <- data.frame(x = 0:max(US_demographics_HC$incidents), Poisson = po.pois,
NegBin = po.negb, Zinb = po.zinb)
obs <- table(US_demographics_HC$incidents) %>% prop.table() %>% data.frame #Observed
obs <- obs[1:20, ]
names(obs) <- c("x", 'Observed')
obs$x <- as.numeric(as.character(obs$x))
mm1 <- melt(obs, id.vars = 'x', value.name = 'prob', variable.name = 'Model')
df <- df[1:20, ]
mm2 <- melt(df, id.vars = 'x', value.name = 'prob', variable.name = 'Model')
ggplot() +
geom_line(data = mm1, aes(lty = Model, x = x, y = prob, group = Model, col = Model), lwd = 3) +
geom_line(data = mm2, aes(lty = Model, x = x, y = prob, group = Model, col = Model), lwd = 1) +
theme_bw() +
labs(x = "Number of Hate Crimes", y = 'Probability',
title = "Models for No. of Hate Crimes",
subtitle = "Both Zero-Inflated Binomial and Negative Binomial fit our observed data well") +
scale_color_manual(values = c('red', 'black', 'blue', 'green')) +
scale_linetype_manual(values = c('dashed', 'solid', 'dashed', 'dashed')) +
theme(legend.position = c(.75, .65),
        plot.title = element_text(face = "bold"))
Both the zero-inflated negative binomial and negative binomial models fit our observed data closely. Using AIC as the basis, we selected the zero-inflated negative binomial model, as it has the lowest AIC among the models.
Our model can then be expressed in the form:
\[ \begin{eqnarray} ln(\widehat{incident}) = intercept + b_1perc\_unemployed + b_2perc\_poverty + b_3perc\_nonwhite + b_4perc\_foreignborn + b_5margin\_repub + b_6popdensity + b_7crimeper1kpop \end{eqnarray} \]
Therefore,
\[ \begin{eqnarray} \widehat{incident} = e^{intercept}e^{b_1perc\_unemployed}e^{b_2perc\_poverty}e^{b_3perc\_nonwhite}e^{b_4perc\_foreignborn}e^{b_5margin\_repub}e^{b_6popdensity}e^{b_7crimeper1kpop} \end{eqnarray} \]
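To make the exponentiated form concrete, here is a minimal sketch evaluating it for one hypothetical county. The coefficients are the (rounded) count-model estimates from our zero-inflated negative binomial fit; the predictor values are made up purely for illustration.

```r
# Evaluate the exponentiated model for one hypothetical county.
# Coefficients: rounded count-model estimates from our ZINB fit;
# predictor values (x) are illustrative, not real county data.
b <- c(intercept = 1.56, perc_unemployed = 3.57, perc_poverty = -3.40,
       perc_nonwhite = -1.26, perc_foreign_born = 8.34,
       margin_repub = -1.83, pop_density = 0.000195, crime_per1k = 0.0082)
x <- c(1, 0.05, 0.15, 0.20, 0.10, 0.05, 100, 40)  # leading 1 for the intercept

pred_count <- exp(sum(b * x))   # identical to the product prod(exp(b * x))
```

The sum-then-exponentiate form and the product of exponentials are the same quantity, which is exactly the equivalence between the two equations above.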
The summary report for our zero-inflated negative binomial model is shown below:
summary(zinb, confint = T)
##
## Call:
## zeroinfl(formula = incidents ~ perc_unemployed + perc_poverty + perc_nonwhite +
## perc_foreign_born + margin_repub + pop_density + crime_per1k, data = US_demographics_HC,
## dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.70497 -0.46911 -0.31340 -0.05928 20.81180
##
## Count model coefficients (negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.55530463 0.16111239 9.654 < 0.0000000000000002 ***
## perc_unemployed 3.56705734 3.03430507 1.176 0.239764
## perc_poverty -3.40159241 1.15155942 -2.954 0.003138 **
## perc_nonwhite -1.25794941 0.32499085 -3.871 0.000109 ***
## perc_foreign_born 8.33625976 1.17491674 7.095 0.00000000000129 ***
## margin_repub -1.82684096 0.16118352 -11.334 < 0.0000000000000002 ***
## pop_density 0.00019519 0.00001027 19.009 < 0.0000000000000002 ***
## crime_per1k 0.00820774 0.00098554 8.328 < 0.0000000000000002 ***
## Log(theta) -0.69919178 0.05175602 -13.509 < 0.0000000000000002 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.17206 0.31389 3.734 0.000188 ***
## perc_unemployed -0.15545 4.57090 -0.034 0.972870
## perc_poverty 0.44636 2.13780 0.209 0.834610
## perc_nonwhite 2.36855 0.65437 3.620 0.000295 ***
## perc_foreign_born -6.34886 1.88387 -3.370 0.000751 ***
## margin_repub 0.19345 0.33184 0.583 0.559918
## pop_density -0.07687 0.01188 -6.470 0.0000000000978 ***
## crime_per1k -0.04263 0.00870 -4.900 0.0000009602534 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta = 0.497
## Number of iterations in BFGS optimization: 59
## Log-likelihood: -5430 on 17 Df
In the summary report above, the model call is followed by a block of negative binomial regression coefficients for each variable, along with standard errors, z-scores, and p-values. A second block follows, showing the zero-inflation model: logit coefficients for predicting excess zeros (in our case, the non-occurrence of hate crime in a county), along with their standard errors, z-scores, and p-values.
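The two blocks combine to give the overall chance of a zero count: P(y = 0) = pi + (1 - pi) * P_NB(0), where pi comes from the logit block and P_NB(0) from the count block. A minimal sketch with illustrative values (not fitted model output; only the dispersion is set close to our fitted Theta of 0.497):

```r
# Overall zero probability in a zero-inflated NB model:
# structural zeros (logit block) plus sampling zeros (count block)
pi_zero <- 0.40    # illustrative structural-zero probability
mu      <- 3.0     # illustrative NB mean
theta   <- 0.5     # dispersion, close to our fitted theta

p_nb0  <- dnbinom(0, size = theta, mu = mu)    # NB chance of a sampling zero
p_zero <- pi_zero + (1 - pi_zero) * p_nb0      # combined chance of observing 0
```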
All of the predictors except perc_unemployed are statistically significant. This model fits the data significantly better than the null model, i.e., the intercept-only model. To show this, we compare the current model to a null model without predictors using a chi-squared test on the difference of log-likelihoods.
zinb0 <- update(zinb, . ~ 1)
pchisq(2 * (logLik(zinb) - logLik(zinb0)), df = 14, lower.tail = FALSE)
## 'log Lik.' 0 (df=17)
The statistically significant result (on 14 degrees of freedom, the difference in the number of estimated parameters between the two models) strongly suggests that the zero-inflated negative binomial model fits the data better than the intercept-only model.
Note that the model output above does not indicate whether our zero-inflated model is an improvement over a standard negative binomial regression. We can determine this by performing a Vuong test of the two models.
pscl::vuong(zinb, negb)
## Vuong Non-Nested Hypothesis Test-Statistic:
## (test-statistic is asymptotically distributed N(0,1) under the
## null that the models are indistinguishible)
## -------------------------------------------------------------
## Vuong z-statistic H_A p-value
## Raw 9.418137 model1 > model2 < 0.000000000000000222
## AIC-corrected 9.021765 model1 > model2 < 0.000000000000000222
## BIC-corrected 7.824124 model1 > model2 0.0000000000000025535
We can see that our test statistic is significant, indicating that our zero-inflated model is better than the negative binomial regression model.
To aid interpretation, we exponentiated the coefficients of the zero-inflated negative binomial model to obtain incidence rate ratios:
est <- cbind(Estimate = coef(zinb), confint(zinb))
exp(est)
## Estimate 2.5 % 97.5 %
## count_(Intercept) 4.73652919 3.45399022759 6.49530175
## count_perc_unemployed 35.41223369 0.09254397842 13550.59850039
## count_perc_poverty 0.03332017 0.00348737001 0.31835842
## count_perc_nonwhite 0.28423628 0.15033039416 0.53741802
## count_perc_foreign_born 4172.45451874 417.15809044583 41733.28315966
## count_margin_repub 0.16092112 0.11733116772 0.22070527
## count_pop_density 1.00019521 1.00017507674 1.00021534
## count_crime_per1k 1.00824152 1.00629585299 1.01019094
## zero_(Intercept) 3.22862650 1.74516409465 5.97309394
## zero_perc_unemployed 0.85602835 0.00011008459 6656.55888585
## zero_perc_poverty 1.56260779 0.02366756321 103.16833601
## zero_perc_nonwhite 10.68192207 2.96244799199 38.51661177
## zero_perc_foreign_born 0.00174874 0.00004356806 0.07019113
## zero_margin_repub 1.21342781 0.63321756082 2.32527828
## zero_pop_density 0.92600637 0.90469222832 0.94782266
## zero_crime_per1k 0.95826928 0.94206777302 0.97474943
Based on the direction of the coefficients in the model and the exponentiated values, the count model in the first block suggests the following for counties with hate crime incidents:
The zero-inflation model in the second block suggests the following for counties with zero recorded hate crimes:
The positive relationship between foreign-born residents and hate crime supports the existing literature on collective efficacy, which suggests that this may stem from residents' inability to communicate and form strong bonds to prevent crime and disorder (Kornhauser, 1978). Note that, to date, heterogeneity has not consistently been found to share a significant relationship with hate crime incidence (Freilich et al., 2014; Gladfelter et al., 2017; Grattet, 2009; Lyons, 2007), but our study shows the contrary.
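As a concrete reading of the count-model estimates printed above, exp(coefficient * change) gives the multiplicative effect on the expected count. Using the reported estimates for perc_foreign_born (8.336) and margin_repub (-1.827), with a hypothetical 10-percentage-point change in each:

```r
# Count-model coefficients from the ZINB summary above (rounded)
b_foreign <- 8.336    # perc_foreign_born
b_repub   <- -1.827   # margin_repub

irr_foreign <- exp(b_foreign * 0.10)  # effect of a 10-point rise in foreign-born share
irr_repub   <- exp(b_repub * 0.10)    # effect of a 10-point rise in Republican margin
```

Under these estimates, a 10-percentage-point increase in the foreign-born share multiplies the expected hate-crime count by roughly 2.3, while the same increase in Republican vote margin multiplies it by roughly 0.83.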
Like poverty, the negative relationship between non-white population and hate crimes is consistent with a similar study by Holder (2018), which focused on only 3 counties using a different hate crime data source and time frame (2010-2016).
Our control variables, population density and violent crime rate, both showed a positive and significant association with hate crimes. This is consistent with other studies, and controlling for them allowed us to isolate the effects of our predictor variables on hate crime.
The relationship between Republican vote margin and hate crime came as a surprise: it suggests that the stronger a county's support for Republicans, the lower the hate crimes. Nevertheless, this insight supports existing research on LGBT crimes by Scheuerman, Parris, Faupel and Werum (2019), who interpreted it as an indication of greater willingness to report in Democrat states rather than lower occurrence of hate crimes.
Does this suggest, then, that because Trump had won several states that had voted Democratic for decades (Bump, 2016), fewer hate crimes were reported during Trump's term? To find out, we explored this from a different lens by looking at the geographical scope of hate crimes (not just the number of incidents) during two presidential terms: Obama (Democrat) and Trump (Republican).
We compared the counties with at least 1 hate crime during Obama’s last 3 years (2014-2016) and Trump’s first 3 years (2017-2019). We also compared the total incidents during these periods.
# Trump's 1st 3 years in office (due to limited data in FBI database)
HC_2017_2019 <- ucr_hate_crimes_2013_2019 %>%
filter(year %in% c(2017:2019) & !is.na(incident_date)) %>%
group_by(county_fips = fips_state_county_code) %>%
summarise(incidents = n())
incidents_trump_term <- sum(HC_2017_2019$incidents)
# Obama's last 3 years in office to be comparable while avoiding any gap year
HC_2014_2016 <- ucr_hate_crimes_2013_2019 %>%
filter(year %in% c(2014:2016) & !is.na(incident_date)) %>%
group_by(county_fips = fips_state_county_code) %>%
summarise(incidents = n())
incidents_obama_term <- sum(HC_2014_2016$incidents)
# Differences between the 2 terms
HC_new <- dplyr::anti_join(HC_2017_2019, HC_2014_2016, by = "county_fips") %>%
mutate(Crimes = "new")
counties_diff <- length(HC_new$county_fips)
incidents_diff <- incidents_trump_term - incidents_obama_term
HC_common <- dplyr::inner_join(HC_2017_2019, HC_2014_2016, by = "county_fips") %>%
rename(incidents_Trump = "incidents.x",
incidents_Obama = "incidents.y") %>%
mutate(incidents = incidents_Trump - incidents_Obama,
Crimes = if_else(incidents > 0, "increased", if_else(incidents < 0, "reduced", "zero"))) %>%
dplyr::select(county_fips, incidents, Crimes)
HC_diff <- rbind(HC_new, HC_common)
# merge with counties data to get latitude and longitude for mapping purposes then plot
HC_diff <- counties_coord %>% left_join(HC_diff, by = "county_fips") %>%
mutate(County = paste(county_name, ", ", state_abbv))
# fill NA with 0
HC_diff$incidents[is.na(HC_diff$incidents)] = 0
# fill NA with "zero"
HC_diff$Crimes[is.na(HC_diff$Crimes)] = "zero"
plot_HC_diff <- HC_diff %>%
dplyr::select(long, lat, county_fips, county_name, state_abbv, incidents, Crimes, state_name) %>%
mutate(county = paste(county_name, ", ", state_abbv),
region = state_name) %>%
dplyr::select(long, lat, county_fips, county, incidents, Crimes, region) %>%
group_by(county_fips, county, Crimes, region) %>%
summarize(long = mean(long),
lat = mean(lat),
incidents = mean(incidents)) %>%
mutate(hate_crimes = if_else(Crimes == "new", paste(toString(incidents), " but none in Obama's term"),
if_else(Crimes == "increased", paste("+", toString(incidents), " vs Obama's term"),
toString(incidents))
)
  )
## `summarise()` has grouped output by 'county_fips', 'county', 'Crimes'. You can override using the `.groups` argument.
viz <- ggplot() +
geom_polygon(data = urbnmapr::states, mapping = aes(x = long, y = lat, group = group),
fill = "cadetblue", color = "white", size = .2) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
geom_point(data = filter(plot_HC_diff, Crimes == "new"),
aes(x = long, y = lat, cases = hate_crimes, where = county),
inherit.aes = FALSE,
color = "#a80000", size = 1.5, alpha = 0.7) +
geom_point(data = filter(plot_HC_diff, Crimes == "increased"),
aes(x = long, y = lat, cases = hate_crimes, where = county),
inherit.aes = FALSE,
color = "black", size = 1.5, alpha = 0.4) +
labs(title = "Hate Crimes per County, Trump's First 3 years (2017-2019)",
subtitle = "",
caption = "Source: FBI Uniform Crime Reporting Data, \nJacob Kaplan's Concatenated files, www.openicpsr.org",
fill = "Hate Crimes per County") +
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8),
plot.title = element_text(face = "bold"),
panel.background = element_rect(color = "white", fill = "white"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
plotly::ggplotly(viz, tooltip = c("cases", "where")) %>%
  plotly::layout(title = list(text = paste0('<b>Hate Crimes per County, Trump\'s First 3 years (2017-2019)</b>','<br>','Net increase of 4,968 incidents and 431 more counties than Obama\'s term')))
#setwd("/Users/alissao/Desktop/Capstone/Data")
#save.image("DASR2021_Capstone_Alissa_Thomas_LATEST.RData")
Trump's first three years registered a net increase of 4,968 hate crimes compared to Obama's last three years in office. In addition, 431 more counties reported at least 1 hate crime, and these counties had zero incidents during Obama's term.
It is clear that the scope of hate crimes widened during Trump’s term. This may be due to his and the Republican party’s strong anti-immigration policies. Trump has attacked and scapegoated immigrants in ways that previous presidents never have — and in the process, he has spread more fear, resentment and hatred of immigrants than any American in history (Washington Post, 2019).
This phenomenon was also observed back in 1994 after the passage of Proposition 187, an anti-immigration reform bill. The Los Angeles County Commission on Human Relations documented a 23.5% increase in hate crimes against Latinos in 1994 and attributed the rise largely to anti-immigrant sentiment. In the eleven months after the passage of 187, over 1,000 inquiries and complaints were reported to CHIRLA (Cervantes, 1995).
In summary, our findings in this data science project are as follows:
The first part of our study graphically explores whether a causal relationship exists between events and hate crimes. After 9/11, there was a spike in reported Muslim/Arab hate crimes. This suggests, at least graphically, that 9/11 had a direct influence on the increase in Muslim/Arab hate crimes in the United States. However, as this explores only the 9/11 event as a case study, it is not representative of the relationship between other events and hate crimes.
Based on the existing literature, there is considerable evidence that police enforcement, political affiliation, unemployment and racial diversity affect the volume of hate crimes (FBI, 2011). Hence, we explored the influence of the four moderators on hate crimes during 2001. We found visual evidence that some of these variables could have influenced hate crimes around 9/11: (a) Democratic states tend to report more hate crimes than Republican states, (b) lower unemployment is associated with higher hate crimes, and (c) states with higher proportions of non-white residents tend to have lower hate crimes post-9/11.
The second part of the study seeks to confirm our state-level findings quantitatively using regression analysis. For our regression model, a smaller unit of analysis (the county level) was used, and the study was extended to hate crimes in general rather than focusing only on 9/11. One notable county-level finding is the significant but negative relationship between the percentage of non-white residents and hate crimes. This supports similar research by Holder (2018), which focused on only 3 counties using a different hate crime data source (survey-based) and time frame (2010-2016). Earlier work by Lyons (2007) found that communities characterized by racial homogeneity were increasingly associated with anti-Black crimes. Messner's (2018) study also highlighted that the relative group size of the Black population in a county has a negative and significant impact on anti-Black targeting rates. These might explain why the smaller the proportion of non-white residents, the more hate crimes are committed.
Another measure of diversity included in our county-level analysis, the percentage of foreign-born residents, turned out to have the largest positive relationship with hate crimes. An earlier study supports our findings and attributes this to residents' inability to communicate and form strong bonds to prevent crime and disorder (Kornhauser, 1978). Population density and violent crime rates likewise showed positive relationships with hate crimes.
We found that the smaller the percentage of the county population below the poverty line, the higher the hate crimes. This is also consistent with Holder's (2018) study: resentment increases against 'other' people whom perpetrators fear are doing better than they are.
Both levels of analysis consistently suggest that the stronger an area's support for Republicans, the lower the hate crimes. A state-level study by Soule & Earl (2001) found that wealthier states with Democrat-dominated legislatures were more likely to adopt hate crime laws, which translates into better reporting.
Nevertheless, we wondered whether, because Trump had won several states that had voted Democratic for decades (Bump, 2016), Trump's term would see a decline in hate crimes. We teased this out graphically by comparing not just the volume but also the geographic scope of hate crime during the respective terms of Obama and Trump. In fact, Trump's term saw 431 more counties reporting at least 1 hate crime, and registered a net increase of 4,290 hate crimes overall, compared with Obama's last 3 years in office.
Our study suggests that we can be more sensitive to the environments and scenarios that influence hate crimes, and thus introduce preventive countermeasures that target the root of the problem.
These insights are particularly important for politicians and policy-makers, as they can inform data-driven decisions on zoning policies, multi-racial awareness programs, and the like. While the focus of our study is the United States, it is highly likely that the predictors and dynamics are fundamentally similar elsewhere. In concrete terms, our insights suggest that to minimize the likelihood of hate crimes, government leaders should carefully balance the influx of foreigners and population growth while curbing violent crimes. Moreover, ethnic quotas per geographic area may help as well, because they prevent the racial homogeneity that predicts hate crimes.
Finally, it is our hope that this study will trigger reflection among politicians and the general public alike, so that hate crimes are taken seriously and action is taken to prevent their proliferation.
This Capstone Project originally sought to understand the causal relationship between anti-Asian hate crimes and the Covid-19 pandemic 'event', but the FBI hate crime database only covers data up to 2019. Complete data would have allowed us to use 9/11 and Covid-19 as two major events and conduct a causal DiD analysis. We hope to continue this study of the Covid-19 pandemic when the data become available.
The association between 9/11 and hate crimes was studied using graphical estimation by DiD analysis, an observational approach intended to establish a possible causal relationship. While we were able to demonstrate this graphically, the approach does not provide the relative impact of each variable, unlike what quantitative models can offer.
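A regression form of DiD would supply the quantitative estimate that the graphical comparison cannot. The sketch below, on simulated monthly counts, treats anti-Muslim/Arab incidents as the 'treated' bias group around September 2001; the data, column names and effect sizes are hypothetical, not the project's actual estimates.

```r
# Hedged sketch of a regression DiD (not the graphical method used in the
# study itself); all data here are simulated.
set.seed(1)
df <- expand.grid(month = 1:36, treated = c(0, 1))  # treated = anti-Muslim/Arab bias
df$post <- as.integer(df$month > 21)                # months after Sep 2001

# Simulate counts with an assumed post-9/11 jump for the treated group
mu <- exp(2 + 0.3 * df$treated + 0.1 * df$post + 1.2 * df$treated * df$post)
df$crimes <- rpois(nrow(df), mu)

# The interaction term is the DiD estimate of the event effect
did <- glm(crimes ~ treated * post, family = poisson, data = df)
coef(did)["treated:post"]
```

The `treated:post` coefficient isolates the post-event change for the treated group net of the common trend, which is exactly the relative impact the graphical approach leaves unquantified.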
While conducting this study, we had hoped to explore other features such as leaders' sentiments on Twitter (e.g. Trump vs. Obama) and their relationship with hate crimes, given several news articles suggesting that Trump's inflammatory messages and racist terms such as "China virus" and "Kung Flu" coincided with spikes in Google searches (Sherman, 2021). However, our project timeline did not allow us to cover these potential predictors exhaustively.
We are unsure whether the more recent county-level data from the FBI are indeed more reliable than the older datasets; however, in the absence of other extensive data sources, they remain the best available. As all our county-level insights are supported by earlier studies, some of which used survey-based data, focused on a different bias group, or covered different time periods, our findings appear nevertheless robust.
Machin, S., 2021. Hate Crime in the Wake of Terror Attacks: Evidence From 7/7 and 9/11 (Hanes & Machin, 2014). [online] SAGE Journals. Available at: https://journals.sagepub.com/doi/10.1177/1043986214536665 [Accessed 15 June 2021].
Cabral, S., 2021. Covid ‘hate crimes’ against Asian Americans on rise. [online] BBC News. Available at: https://www.bbc.com/news/world-us-canada-56218684 [Accessed 15 June 2021].
Bureau of Labor Statistics, 2021. [online] Bureau of Labor Statistics. Available at: https://www.bls.gov/web/laus/laumstrk.htm [Accessed 14 June 2021].
Fairchild, A. and MacKinnon, D., 2008. A General Model for Testing Mediation and Moderation Effects. Prevention Science, [online] 10(2), pp.87-99. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2908713/ [Accessed 19 June 2021].
FBI, 2011. Variables affecting hate crimes. Hate Crime Statistics 2011. [online] FBI. Available at: https://ucr.fbi.gov/hate-crime/2011/resources/variables-affecting-crime [Accessed 19 June 2021].
Sherman, A. (2021, March 21). Hate crimes against Asian Americans: What the numbers show, and don't. Poynter. https://www.poynter.org/fact-checking/2021/hate-crimes-against-asian-americans-what-the-numbers-show-and-dont/
Allison, P. (2019, November 27). Do We Really Need Zero-Inflated Models? Statistical Horizons. https://statisticalhorizons.com/zero-inflated-models
Messner, S. (2018). Steven Messner. Crime & Justice Research Alliance. https://crimeandjusticeresearchalliance.org/steven-messner-2/
Holder, Eaven, “Political Competition and Predictors of Hate Crime: A County-level Analysis” (2018). Electronic Theses and Dissertations. Paper 3491. https://dc.etsu.edu/etd/3491
Lyons, Christopher J. 2008. “Defending Turf: Racial Demographics and Hate Crime Against Blacks and Whites.” Social Forces 87 (1): 357–85.
Scheuerman, Heather L.; Parris, Christie L.; Faupel, Alison H.; and Werum, Regina E., “State-Level Determinants of Hate Crime Reporting: Examining the Impact of Structural and Social Movement Influences” (2020). Sociology Department, Faculty Publications. 713. https://digitalcommons.unl.edu/sociologyfacpub/713
Anbinder, T. (2019, November 7). Trump has spread more hatred of immigrants than any American in history. Washington Post. https://www.washingtonpost.com/outlook/trump-has-spread-more-hatred-of-immigrants-than-any-american-in-history/2019/11/07/7e253236-ff54-11e9-8bab-0fc209e065a8_story.html
Bump, P. (2016, November 15). The counties that flipped parties to swing the 2016 election. Washington Post. https://www.washingtonpost.com/news/the-fix/wp/2016/11/15/the-counties-that-flipped-parties-to-swing-the-2016-election/
MIT Election Data and Science Lab (2018). County Presidential Election Returns 2000-2020 [Data file]. Retrieved from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/VOQCHQ
UCLA: Statistical Consulting Group. (n.d.). Negative Binomial Regression | R Data Analysis Examples. Retrieved June 14, 2021, from https://stats.idre.ucla.edu/r/dae/negative-binomial-regression/
Soule, S., & Earl, J. (2001). The Enactment of State-Level Hate Crime Law in the United States: Intrastate and Interstate Factors. Sociological Perspectives, 44(3). https://www.jstor.org/stable/10.1525/sop.2001.44.3.281
Inter-university Consortium for Political and Social Research (ICPSR). (2019, January 22). A Note on the Use of County-Level UCR Data. https://www.openicpsr.org/openicpsr/project/108164/version/V5/view?path=/openicpsr/108164/fcr:versions/V5/Maltz---Targonski-2002-A-Note-on-the-Use-of-County-Level-UCR-Data.pdf
FBI Crime Data Explorer (2021). Hate Crime Statistics [Data file]. Retrieved from https://crime-data-explorer.app.cloud.gov/pages/downloads
FBI Crime Data Explorer (2021). Police Employee [Data file]. Retrieved from https://crime-data-explorer.app.cloud.gov/pages/downloads
United States Census (2021). Hispanic or Latino Origin by Race [Data file]. Retrieved from https://data.census.gov/cedsci/table?q=B03002&tid=ACSDT1Y2019.B03002
United States Census (2021). Place of Birth by Nativity and Citizenship Status [Data file]. Retrieved from https://data.census.gov/cedsci/table?q=B05002&tid=ACSDT1Y2019.B05002
United States Census (2021). Marital Status by Sex by Labor Force Participation [Data file]. Retrieved from https://data.census.gov/cedsci/table?q=B12006&tid=ACSDT1Y2019.B12006
United States Census (2021). Ratio of Income to Poverty Level in the Past 12 Months [Data file]. Retrieved from https://data.census.gov/cedsci/table?q=C17002&tid=ACSDT1Y2019.C17002
OPENICPSR (2021). Jacob Kaplan’s Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1991-2019 [Data file]. Retrieved from https://www.openicpsr.org/openicpsr/project/103500/version/V7/view
United States Census (2020, June 9). Average Household Size and Population Density - County [Data file]. Retrieved from https://covid19.census.gov/datasets/21843f238cbb46b08615fc53e19e0daf_1/about
Cronin, S., McDevitt, J., Farrell, A. and Nolan, J., 2007. Bias-Crime Reporting. American Behavioral Scientist, 51(2), pp.213-231.
Cervantes, N. (1995). Hate Unleashed: Los Angeles in the Aftermath of Proposition 187. https://escholarship.org/uc/item/1p41v152
Kornhauser, R. R. (1978). Social Sources of Delinquency: An Appraisal of Analytic Models. Chicago: University of Chicago Press.